Utility preserving query log anonymization via semantic microaggregation

نویسندگان

  • Montserrat Batet
  • Arnau Erola
  • David Sánchez
  • Jordi Castellà-Roca
چکیده

Query logs are of great interest for scientists and companies for research, statistical and commercial purposes. However, the availability of query logs for secondary uses raises privacy issues since they allow the identification and/or revelation of sensitive information about individual users. Hence, query anonymization is crucial to avoid identity disclosure. To enable the publication of privacy-preserved -but still usefulquery logs, in this paper, we present an anonymization method based on semantic microaggregation. Our proposal aims at minimizing the disclosure risk of anonymized query logs while retaining their semantics as much as possible. First, a method to map queries to their formal semantics extracted from the structured categories of the Open Directory Project is presented. Then, a microaggregation method is adapted to perform a semantically-grounded anonymization of query logs. To do so, appropriate semantic similarity and semantic aggregation functions are proposed. Experiments performed using real AOL query logs show that our proposal better retains the utility of anonymized query logs than other related works, while also minimizing the disclosure risk.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Utility-Preserving Differentially Private Data Releases Via Individual Ranking Microaggregation

Being able to release and exploit open data gathered in information systems is crucial for researchers, enterprises and the overall society. Yet, these data must be anonymized before release to protect the privacy of the subjects to whom the records relate. Differential privacy is a privacy model for anonymization that offers more robust privacy guarantees than previous models, such as k-anonym...

متن کامل

Semantic microaggregation for the anonymization of query logs using the open directory project

Web search engines gather information from the queries performed by the user in the form of query logs. These logs are extremely useful for research, marketing, or profiling, but at the same time they are a great threat to the user’s privacy. We provide a novel approach to anonymize query logs so they ensure user k-anonymity, by extending a common method used in statistical disclosure control: ...

متن کامل

Privacy Preserving Web Query Log Publishing: A Survey on Anonymization Techniques

Releasing Web query logs which contain valuable information for research or marketing, can breach the privacy of search engine users. Therefore rendering query logs to limit linking a query to an individual while preserving the data usefulness for analysis, is an important research problem. This survey provides an overview and discussion on the recent studies on this direction.

متن کامل

Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)

Background and Aim: Information systems cannot be well designed or developed without a clear understanding of needs of users, manner of their information seeking and evaluating. This research has been designed to analyze the Ganj (Iranian research institute of science and technology database) users’ query refinement behaviors via log analysis.    Methods: The method of this research is log anal...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Sci.

دوره 242  شماره 

صفحات  -

تاریخ انتشار 2013